Cumulative 6-Year Risk of Screen-Detected Ductal Carcinoma In Situ by Screening Frequency

Key Points
Question: Does the cumulative risk of screen-detected ductal carcinoma in situ (DCIS) vary according to mammography screening interval and clinical risk factors?
Findings: In this cohort study, a well-calibrated model was developed to predict the cumulative 6-year risk of screen-detected DCIS in 916 931 women. Compared with women undergoing biennial mammography, those undergoing annual mammography had a 40% to 45% higher 6-year cumulative risk of screen-detected DCIS, whereas those undergoing triennial mammography had a lower risk.
Meaning: This risk model provides estimates of the 6-year probability of screen-detected DCIS and can inform discussions of screening benefits and harms for those considering a screening interval other than biennial.

weight was assigned to each covariate combination, equal to the frequency of this covariate combination across all 20 imputed datasets. This base weight ($w_0$) was further adjusted to reflect the US population of women by weighting on age, race/ethnicity, and first-degree family history of breast cancer: $w(a, r, h, x) = w_0(a, r, h, x) \times P(a, r) \times P(h \mid a, r)$, where $a$ denotes age, $r$ race/ethnicity, $h$ first-degree family history, and $x$ the other covariates; $P(a, r)$ is the proportion of women with this age and race/ethnicity in the US population; and $P(h \mid a, r)$ is the proportion of women with or without a family history within this age and race/ethnicity subgroup of the US population. The age and race/ethnicity distribution of the 2016 US population was estimated from US census data.⁴ The percentage of women with a first-degree family history by age and race/ethnicity was estimated from the 2015 NHIS.⁵

Evaluation of discriminatory accuracy: The AUC was estimated using 5-fold cross-validation. Clustered randomization ensured that all screens for a single woman were assigned to the same one of the 5 cross-validation subsets and that the same assignment was applied to all imputed datasets. Screening examinations in 4 subsets were used as the training dataset to obtain fitted models (20 fitted models, one per imputed dataset), and the remaining subset was used to obtain predicted risks and calculate the AUC for validation. The multiple imputation process was performed separately for each cross-validation training dataset (i.e., it was repeated 5 times). Predicted risk was estimated for each covariate and outcome combination in the validation subset, and the final predicted risk was the average of the predicted risks from the 20 imputed datasets. To map back to the validation population, each covariate and outcome combination was assigned a weight equal to its proportion in the validation data (within each of the 20 imputed validation subsets) for the AUC calculation, which yielded 20 AUCs per validation subset.
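The US-population reweighting formula $w = w_0 \times P(a, r) \times P(h \mid a, r)$ described above can be sketched as follows. This is a minimal illustration, not the authors' code: the tuple layout of a covariate combination, the function name `population_weights`, and the toy proportions in the usage lines are all hypothetical, and $w_0$ is taken as the relative frequency of the combination in the pooled imputed datasets, as the text describes.

```python
from collections import Counter


def population_weights(combos, p_age_race, p_fh_given_age_race):
    """Reweight covariate combinations toward the US population (sketch).

    combos: list of (age_group, race, family_history, other) tuples pooled
        over all imputed datasets (hypothetical layout).
    p_age_race[(age_group, race)]: proportion of US women in that cell.
    p_fh_given_age_race[(age_group, race)][family_history]: proportion of
        women with (1) or without (0) a first-degree family history.
    """
    counts = Counter(combos)
    total = sum(counts.values())
    weights = {}
    for combo, n in counts.items():
        age, race, fh, _other = combo
        w0 = n / total  # base weight: relative frequency of the combination
        weights[combo] = (
            w0 * p_age_race[(age, race)] * p_fh_given_age_race[(age, race)][fh]
        )
    return weights


# Toy inputs (hypothetical numbers, not study data).
combos = [("50-59", "White", 0, "dense")] * 3 + [("50-59", "White", 1, "dense")]
w = population_weights(
    combos,
    p_age_race={("50-59", "White"): 0.2},
    p_fh_given_age_race={("50-59", "White"): {0: 0.9, 1: 0.1}},
)
```

With these toy inputs, the combination without a family history receives weight 0.75 × 0.2 × 0.9 and the one with a family history 0.25 × 0.2 × 0.1.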
The final AUC was the average of the 5 × 20 AUCs across the 5 validation subsets.
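The cross-validated AUC procedure can be sketched in three pieces: a clustered fold assignment keyed on the woman (so all of her screens fall in one subset and the same mapping is reused for every imputed dataset), a weighted AUC in which each covariate and outcome combination carries its proportion in the validation data as a weight, and the average over the 5 × 20 AUCs. The function names and the fixed random seed are assumptions for illustration; the weighted AUC uses the usual pairwise (Mann-Whitney) definition with ties counted as one half.

```python
import random
from statistics import fmean


def assign_folds(woman_ids, k=5, seed=1):
    """Map each woman to one of k folds so all of her screens share a fold.

    The returned dict is reused unchanged for every imputed dataset.
    """
    ids = sorted(set(woman_ids))
    random.Random(seed).shuffle(ids)
    return {wid: i % k for i, wid in enumerate(ids)}


def weighted_auc(scores, outcomes, weights):
    """Pairwise AUC where each case/control carries a frequency weight."""
    cases = [(s, w) for s, y, w in zip(scores, outcomes, weights) if y == 1]
    controls = [(s, w) for s, y, w in zip(scores, outcomes, weights) if y == 0]
    num = 0.0
    for s1, w1 in cases:
        for s0, w0 in controls:
            if s1 > s0:
                num += w1 * w0
            elif s1 == s0:
                num += 0.5 * w1 * w0  # ties count as half a concordant pair
    den = sum(w for _, w in cases) * sum(w for _, w in controls)
    return num / den


def final_auc(auc_grid):
    """auc_grid[f][m] is the AUC for validation fold f and imputation m."""
    return fmean(a for fold in auc_grid for a in fold)


folds = assign_folds(["w1", "w1", "w2", "w3", "w4", "w5", "w6"])
auc = weighted_auc([0.9, 0.2, 0.1], [1, 0, 0], [1.0, 2.0, 1.0])
```

The double loop is quadratic in the number of combinations; a sort-based implementation is preferable at scale, but the loop makes the weighting explicit.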
To assess overfitting, we compared the AUC from the models fit using the full data ($\mathrm{AUC}_{All}$) with the AUC from the models fit using the 5-fold cross-validation described above ($\mathrm{AUC}_{CV}$). The overall predicted risk was the average risk across the 20 imputed datasets. $\mathrm{AUC}_{All}$ was estimated as the average of 20 AUCs, each calculated using the overall risk to predict the outcomes in one of the 20 imputed datasets. The variance and 95% confidence interval of $\mathrm{AUC}_{All}$ were calculated using Rubin's rules to account for the variance due to multiple imputation. The difference $\mathrm{AUC}_{All} - \mathrm{AUC}_{CV}$ is called the optimism,⁶ which is expected to be small when overfitting is small. To account for overfitting, the adjusted AUC and its confidence interval were calculated by subtracting the optimism from $\mathrm{AUC}_{All}$ and from its confidence limits: $(\mathrm{CI}_{Lower}^{Adj}, \mathrm{CI}_{Upper}^{Adj}) = (\mathrm{CI}_{Lower}^{All} - \mathrm{optimism}, \mathrm{CI}_{Upper}^{All} - \mathrm{optimism})$.
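The optimism correction can be sketched as below, assuming per-imputation AUCs and their within-imputation variances are already available (for example from a DeLong-type estimator; how the authors obtained them is not stated here). Rubin's rules pool the estimates as $T = \bar{W} + (1 + 1/m)B$. For brevity the sketch uses a normal reference distribution; Rubin's rules are often applied with a t reference distribution instead. Note that the adjusted point estimate $\mathrm{AUC}_{All} - \mathrm{optimism}$ equals $\mathrm{AUC}_{CV}$ algebraically; only the interval borrows its width from $\mathrm{AUC}_{All}$. Function names are hypothetical.

```python
from statistics import NormalDist, fmean, variance


def rubin_pool(estimates, within_vars):
    """Pool m imputation-specific estimates with Rubin's rules."""
    m = len(estimates)
    q_bar = fmean(estimates)          # pooled point estimate
    w_bar = fmean(within_vars)        # mean within-imputation variance
    b = variance(estimates)           # between-imputation variance (m - 1 df)
    return q_bar, w_bar + (1 + 1 / m) * b


def optimism_adjusted(auc_all_per_imp, var_all_per_imp, auc_cv, level=0.95):
    """Shift AUC_All and its CI down by the optimism AUC_All - AUC_CV."""
    auc_all, total_var = rubin_pool(auc_all_per_imp, var_all_per_imp)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation
    half_width = z * total_var ** 0.5
    optimism = auc_all - auc_cv
    ci = (auc_all - half_width - optimism, auc_all + half_width - optimism)
    return auc_all - optimism, ci, optimism


# Toy inputs (hypothetical): 2 imputations instead of 20 for readability.
adj_auc, adj_ci, optimism = optimism_adjusted([0.70, 0.72], [1e-4, 1e-4], auc_cv=0.69)
```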
Model calibration: Model calibration was assessed by the ratio of expected to observed (E/O) numbers of women with screen-detected DCIS, for all women and within risk-decile subgroups, using 5-fold cross-validation. As for the AUC, the multiple imputation process was performed for each cross-validation training dataset. The predicted risk was estimated for each covariate combination in the validation subset, and the (fold-specific) final predicted risk was the average of the predicted risks from the 20 imputed datasets in each validation subset. Combinations of covariates were classified into categories of low to high risk based on deciles of the predicted risks, where the …

Note that the confidence intervals may be too narrow because they ignore variation between the 5 cross-validation folds and variation in the E estimates.
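A minimal sketch of the E/O calculation, overall and by decile of predicted risk, assuming one predicted risk and one observed 0/1 outcome per screen in the validation data: E is the sum of predicted risks and O the count of observed screen-detected DCIS. The decile cut points come from `statistics.quantiles`; the authors' exact decile construction is not recoverable from the text, and the function names and toy data are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def expected_observed(pred, obs):
    """E/O ratio: sum of predicted risks over number of observed events."""
    return sum(pred) / sum(obs)


def eo_by_decile(pred, obs):
    """E/O ratio within each decile of predicted risk (0 = lowest)."""
    cuts = quantiles(pred, n=10)  # 9 cut points
    groups = {}
    for p, y in zip(pred, obs):
        d = bisect_right(cuts, p)
        e, o = groups.get(d, (0.0, 0))
        groups[d] = (e + p, o + y)
    return {d: (e / o if o else float("nan")) for d, (e, o) in sorted(groups.items())}


# Toy data (hypothetical): 100 screens, one observed event per decile.
pred = [i / 100 for i in range(1, 101)]
obs = [1 if i % 10 == 0 else 0 for i in range(100)]
overall = expected_observed(pred, obs)
by_decile = eo_by_decile(pred, obs)
```

An E/O ratio near 1 indicates good calibration; values above 1 mean the model overpredicts in that group.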

Model for risk of competing events
As with the risk of screen-detected DCIS, we estimated the risk of competing events (death or invasive cancer) within 1, 2, or 3 years after annual, biennial, or triennial screening, respectively, using logistic regression with the same study covariates.
For the competing events, some biennial and triennial screens did not have complete cancer capture for 2 and 3 years of follow-up, respectively. These screens therefore had missing values for competing events; they were still included in the imputation models to help impute missing study covariates (see Supplementary Table 5). However, modeling the risk of competing events required complete capture following each screen (2 years for biennial and 3 years for triennial screens), so screens without complete cancer capture were excluded from the competing event risk model. For each possible combination of covariates, the final risk was the average of the 20 risks from the models fit to the 20 imputed datasets.
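The two data-handling rules in this paragraph (drop screens without complete capture over the full interval before fitting the competing event model, then average the 20 imputation-specific predictions) can be sketched as follows. The dictionary field names `capture_years` and `interval_years`, and the stand-in models, are hypothetical; a real analysis would use fitted logistic regression models in their place.

```python
from statistics import fmean


def complete_capture(screens):
    """Keep screens whose cancer capture covers the full screening interval."""
    return [s for s in screens if s["capture_years"] >= s["interval_years"]]


def pooled_risk(fitted_models, covariates):
    """Average predicted risk over the models fit to each imputed dataset."""
    return fmean(model(covariates) for model in fitted_models)


screens = [
    {"id": 1, "interval_years": 2, "capture_years": 2.0},
    {"id": 2, "interval_years": 3, "capture_years": 1.5},  # incomplete: excluded
    {"id": 3, "interval_years": 1, "capture_years": 4.0},
]
kept = complete_capture(screens)

# Hypothetical stand-ins for 20 fitted models (one per imputed dataset).
models = [lambda x, b=b: 0.01 * b for b in range(1, 21)]
risk = pooled_risk(models, covariates={"age": 55})
```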

Six-year cumulative risk of screen-detected DCIS
Estimate 6-year cumulative risk: The cumulative risk of screen-detected DCIS after 6 years of annual, biennial, or triennial screening was calculated for each covariate combination from the fitted logistic regression models, fixing the year of screen at the latest date in the data (January 31, 2020). We used a discrete-time survival model to estimate the 6-year cumulative risk of screen-detected DCIS while accounting for censoring and the competing risks of other outcomes.⁷ Let the covariates at the $i$th screen be denoted $X(i)$, where $i = 1, \ldots, 6$ for annual screening, $i = 1, 2, 3$ for biennial screening, and $i = 1, 2$ for triennial screening. We assume that age increases by 1, 2, or 3 years at each subsequent annual, biennial, or triennial screen, i.e., $a_i = a_1 + k(i - 1)$ ($k = 1, 2, 3$), where $a_i$ is age at the $i$th screen and $k$ is the screening interval in years, while the other covariates $Z(i)$ remain the same as at the first screen, i.e., $Z(i) = Z(1)$. The cumulative risks after 6 years of 6 annual screens were estimated by … . Similarly, the cumulative risks after 6 years of 3 biennial screens were estimated by … , and the cumulative risks after 6 years of 2 triennial screens were estimated by … . For each possible combination of covariates, 20 cumulative risks were estimated from the 20 models fit to the 20 imputed datasets and averaged to estimate the cumulative risk for that covariate combination.
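The sketch below assumes one common discrete-time competing-risks formulation, which may differ in detail from the equations the authors used: with $n = 6/k$ screens, $\mathrm{CR} = \sum_{i=1}^{n} p_D(i) \prod_{j=1}^{i-1} [1 - p_D(j)][1 - p_C(j)]$, where $p_D(i)$ is the per-screen risk of screen-detected DCIS at screen $i$ and $p_C(j)$ is the risk of a competing event (death or invasive cancer) in the interval after screen $j$. Age advances as $a_i = a_1 + k(i - 1)$ and all other covariates are held fixed, so the risk functions take only age and interval. The callables `p_dcis` and `p_competing` are hypothetical stand-ins for the fitted logistic models.

```python
from statistics import fmean


def cumulative_risk(first_age, interval, p_dcis, p_competing, horizon=6):
    """Assumed discrete-time cumulative incidence of screen-detected DCIS.

    interval: 1 = annual, 2 = biennial, 3 = triennial screening.
    p_dcis(age, interval): hypothetical per-screen DCIS risk.
    p_competing(age, interval): hypothetical risk of death or invasive
        cancer within `interval` years after a screen.
    """
    n_screens = horizon // interval  # 6 annual, 3 biennial, 2 triennial
    event_free = 1.0  # P(no DCIS and no competing event before screen i)
    total = 0.0
    for i in range(1, n_screens + 1):
        age = first_age + interval * (i - 1)  # a_i = a_1 + k(i - 1)
        pd = p_dcis(age, interval)
        pc = p_competing(age, interval)
        total += event_free * pd
        event_free *= (1.0 - pd) * (1.0 - pc)
    return total


def averaged_cumulative_risk(first_age, interval, model_pairs):
    """Average cumulative risk over (p_dcis, p_competing) pairs, one per imputation."""
    return fmean(
        cumulative_risk(first_age, interval, pd, pc) for pd, pc in model_pairs
    )
```

As a sanity check, with a constant per-screen risk of 0.01 and no competing events, six annual screens give $1 - 0.99^6$ and three biennial screens give $1 - 0.99^3$.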